feat(ffi): plumb simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705
feat(ffi): plumb simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705Brijesh-Thakkar wants to merge 7 commits into
simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705Conversation
…hrough FFI_WindowUDF Closes apache#22332 Add documentation, expressions, simplify, and reverse_expr fn pointers to FFI_WindowUDF struct Implement producer-side wrappers for each new fn pointer Wire ForeignWindowUDF trait methods to call the fn pointers Add FFI_ExpressionArgs in a new expression_args.rs sibling module to carry Vec<Arc<dyn PhysicalExpr>> across the boundary Add FFI_ReversedUDWF enum (#[repr(C, u8)]) to represent the three ReversedUDWF variants stably simplify executes on the producer side and returns a serialized Expr; the consumer closure deserializes and returns the result Unit tests (local-bypass + forced-foreign) for all four methods Integration test for documentation in ffi_udwf.rs Note: ForeignWindowUDF::simplify() uses DefaultLogicalExtensionCodec internally; UDWFs requiring a custom codec are a known follow-up. API change: FFI_WindowUDF struct layout modified — target main only, no backport.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR extends the UDWF FFI surface to expose additional WindowUDFImpl capabilities (documentation, expressions, simplify, and reverse), and adds tests to validate cross-FFI behavior.
Changes:
- Add new FFI entry points for
documentation,expressions,simplify, andreverse_expronFFI_WindowUDF - Introduce an
FFI_ExpressionArgswrapper for passingExpressionArgsacross FFI - Add/extend tests to cover the new UDWF FFI behaviors
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| datafusion/ffi/src/udwf/mod.rs | Adds new UDWF FFI function pointers + wrappers, plus ForeignWindowUDF implementations for the new trait methods |
| datafusion/ffi/src/udwf/expression_args.rs | New FFI-stable container to pass ExpressionArgs across the FFI boundary |
| datafusion/ffi/tests/ffi_udwf.rs | Adds a test ensuring UDWF documentation survives the FFI wrapping/unwrapping path |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| impl TryFrom<ExpressionArgs<'_>> for FFI_ExpressionArgs { | ||
| type Error = DataFusionError; | ||
|
|
||
| fn try_from(args: ExpressionArgs) -> Result<Self, DataFusionError> { |
| unsafe extern "C" fn expressions_fn_wrapper( | ||
| udwf: &FFI_WindowUDF, | ||
| args: FFI_ExpressionArgs, | ||
| ) -> SVec<FFI_PhysicalExpr> { | ||
| unsafe { | ||
| let inner = udwf.inner(); | ||
| let args = ForeignExpressionArgs::try_from(args).unwrap(); | ||
| let expressions = inner.expressions((&args).into()); | ||
| expressions.into_iter().map(FFI_PhysicalExpr::from).collect() | ||
| } | ||
| } |
| fn expressions( | ||
| &self, | ||
| expr_args: datafusion_expr::function::ExpressionArgs, | ||
| ) -> Vec<Arc<dyn PhysicalExpr>> { | ||
| unsafe { | ||
| let args = FFI_ExpressionArgs::try_from(expr_args).unwrap(); | ||
| (self.udf.expressions)(&self.udf, args) | ||
| .into_iter() | ||
| .map(|e| Arc::<dyn PhysicalExpr>::from(&e)) | ||
| .collect() | ||
| } | ||
| } |
| /// user defined signatures, which will in turn call coerce_types to be called. This | ||
| /// call should be transparent to most users as the internal function performs the | ||
| /// appropriate calls on the underlying [`WindowUDF`] | ||
| pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation, |
| fn simplify(&self) -> Option<WindowFunctionSimplification> { | ||
| let udf = self.udf.clone(); | ||
| Some(Box::new(move |wf, info| { |
| pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation, | ||
|
|
||
| pub expressions: unsafe extern "C" fn( | ||
| udwf: &Self, | ||
| args: FFI_ExpressionArgs, | ||
| ) -> SVec<FFI_PhysicalExpr>, | ||
|
|
||
| pub simplify: unsafe extern "C" fn( | ||
| udwf: &Self, | ||
| window_function: SVec<u8>, | ||
| schema: WrappedSchema, | ||
| ) -> FFI_Result<FFI_Option<SVec<u8>>>, | ||
|
|
||
| pub reverse_expr: unsafe extern "C" fn(udwf: &Self) -> FFI_ReversedUDWF, |
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
e0cacfd to
027ea4b
Compare
|
@timsaucer Please review it and suggest if any changes are required |
|
Thank you for the PR! I'm a bit busy for the moment, but I'll try to get to it as soon as I can. |
Ok sure, No Problem |
|
Hi @timsaucer, just checking in on this PR whenever you have time. I've addressed the review feedback and all checks are passing. Please let me know if there's anything else I should update. Thanks! |
timsaucer
left a comment
There was a problem hiding this comment.
Thank you for the PR. There are a couple of things I'm not sure why we're doing them in the way we are, so I could use some help understanding what problem they're solving. Specifically the comments about simplify() but there are others.
| name: Circular Dependency Check | ||
| runs-on: ubuntu-latest | ||
| container: | ||
| image: amd64/rust |
There was a problem hiding this comment.
Why are these changes necessary? They seem unrelated to the current PR.
| pub struct ForeignExpressionArgs { | ||
| input_exprs: Vec<Arc<dyn PhysicalExpr>>, | ||
| input_fields: Vec<FieldRef>, | ||
| } |
There was a problem hiding this comment.
Why have a ForeignExpressionArgs? We aren't implementing a trait, which is where that pattern is needed. This seems like an unnecessary layer. Is it because ExpressionArgs requires a borrow and we need an owned struct?
If that's the case then I think we should follow the naming convention like ForeignReturnFieldArgsOwned because these expression arguments are not actually foreign at this point. If you're using an agent and it generated this naming convention due to the FFI skill in our repository, then it would be good to update the skill to reflect the fact that it missed this convention.
| /// call should be transparent to most users as the internal function performs the | ||
| /// appropriate calls on the underlying [`WindowUDF`] | ||
| /// Pointer lifetime is tied to the inner Arc; null = None | ||
| pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation, |
There was a problem hiding this comment.
I was going to comment on this but it looks like Copilot has identified the same problem. can you address the above comment?
| /// Serializes WindowFunction via DefaultLogicalExtensionCodec; | ||
| /// returns None variant if no simplification; only called when has_simplify=true |
There was a problem hiding this comment.
We definitely do not want to use the default logical extension codec here.
| fn simplify(&self) -> Option<WindowFunctionSimplification> { | ||
| if !self.udf.has_simplify { | ||
| return None; | ||
| } | ||
|
|
||
| let udf = self.udf.clone(); | ||
| Some(Box::new(move |wf, info| { | ||
| let codec = datafusion_proto::logical_plan::DefaultLogicalExtensionCodec {}; | ||
|
|
||
| // To serialize the window function | ||
| let expr = Expr::WindowFunction(Box::new(wf)); | ||
| let protobuf = datafusion_proto::logical_plan::to_proto::serialize_expr( | ||
| &expr, &codec, | ||
| ) |
There was a problem hiding this comment.
I'm unsure why we're doing this serialization to cross the boundary.
Which issue does this PR close?
Closes #22332
Rationale for this change
FFI_WindowUDFwas silently dropping producer overrides of four defaultedWindowUDFImplmethods. Any plugin that overrodeexpressions(),simplify(),reverse_expr(), ordocumentation()would have thoseoverrides ignored on the consumer side, with the trait defaults running
instead. This could change window-function semantics (e.g.
expressions()selects which physical expressions reach the
PartitionEvaluator) orsilently suppress planner-level optimizations (
simplify,reverse_expr).What changes are included in this PR?
documentation— newunsafe extern "C"fn pointer returning*const Documentation(null =None); consumer reconstructs viaptr.as_ref()expressions— new fn pointer takingFFI_ExpressionArgsandreturning
SVec<FFI_PhysicalExpr>; a new sibling moduleudwf/expression_args.rsprovidesFFI_ExpressionArgs/ForeignExpressionArgsto carryVec<Arc<dyn PhysicalExpr>>andVec<FieldRef>across the boundarysimplify— closures cannot cross FFI, so simplification executeson the producer side: the consumer serializes the
WindowFunctiontoprotobuf bytes, calls the fn pointer, and deserializes the returned
Expr; the producer wrapper deserializes, calls the innersimplifyclosure, and returns the serialized result
reverse_expr— newFFI_ReversedUDWFenum (#[repr(C, u8)],matching the existing ABI-stable enum pattern in this crate) with
variants
Identical,NotSupported, andReversed(FFI_WindowUDF);the
Reversedvariant uses the round-trip downcast optimization toavoid double-wrapping when the inner UDF is already a
ForeignWindowUDFKnown limitation:
ForeignWindowUDF::simplify()currently usesDefaultLogicalExtensionCodecinternally. UDWFs whose expressions requirea custom codec to round-trip are not yet supported through the simplify
path and are tracked as a follow-up.
Are these changes tested?
Yes. For each of the four methods:
datafusion/ffi/src/udwf/mod.rscover both thelocal-bypass path (marker IDs match, downcasts to concrete type) and
the forced-foreign path (
mock_foreign_marker_idset, full FFI pathexercised)
datafusion/ffi/tests/ffi_udwf.rscoversdocumentationacross the crate boundaryAre there any user-facing changes?
FFI_WindowUDFstruct layout has changed — this is an ABI-breaking changefor anyone linking against the struct directly. The PR targets
mainonlywith no backport, per the
datafusion-fficrate convention for layoutchanges.